A Distance Function for Data with Missing Values and Its Application

ثبت نشده
چکیده

Missing values in data are common in real world applications. Since the performance of many data mining algorithms depend critically on it being given a good metric over the input space, we decided in this paper to define a distance function for unlabeled datasets with missing values. We use the Bhattacharyya distance, which measures the similarity of two probability distributions, to define our new distance function. According to this distance, the distance between two points without missing attributes values is simply the Mahalanobis distance. When on the other hand there is a missing value of one of the coordinates, the distance is computed according to the distribution of the missing coordinate. Our distance is general and can be used as part of any algorithm that computes the distance between data points. Because its performance depends strongly on the chosen distance measure, we opted for the k nearest neighbor classifier to evaluate its ability to accurately reflect object similarity. We experimented on standard numerical datasets from the UCI repository from different fields. On these datasets we simulated missing values and compared the performance of the kNN classifier using our distance to other three basic methods. Our experiments show that kNN using our distance function outperforms the kNN using other methods. Moreover, the runtime performance of our method is only slightly higher than the other methods. Keywords—Missing values, Distance metric, Bhattacharyya distance.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Fuzzy C-means Algorithm for Clustering Fuzzy Data and Its Application in Clustering Incomplete Data

The fuzzy c-means clustering algorithm is a useful tool for clustering; but it is convenient only for crisp complete data. In this article, an enhancement of the algorithm is proposed which is suitable for clustering trapezoidal fuzzy data. A linear ranking function is used to define a distance for trapezoidal fuzzy data. Then, as an application, a method based on the proposed algorithm is pres...

متن کامل

Applying the Mahalanobis-Taguchi System to Vehicle Ride

The Mahalanobis Taguchi System is a diagnosis and forecasting method for multivariate data. Mahalanobis distance is a measure based on correlations between the variables and different patterns that can be identified and analyzed with respect to a base or reference group. The Mahalanobis Taguchi System is of interest because of its reported accuracy in forecasting small, correlated data sets. Th...

متن کامل

Performance evaluation of different estimation methods for missing rainfall data

There are numerous methods to estimate missing values of which some are used depending on the data type and regional climatic characteristics. In this research, part of the monthly precipitation data in Sarab synoptic station, east Azerbaijan province, Iran was randomly considered missing values. In order to study the effectiveness of various methods to estimate missing data, by seven classic s...

متن کامل

Video Subject Inpainting: A Posture-Based Method

Despite recent advances in video inpainting techniques, reconstructing large missing regions of a moving subject while its scale changes remains an elusive goal. In this paper, we have introduced a scale-change invariant method for large missing regions to tackle this problem. Using this framework, first the moving foreground is separated from the background and its scale is equalized. Then, a ...

متن کامل

A new attitude coupled with the basic fuzzy thinking to distance between two fuzzy numbers

Fuzzy measures are suitable in analyzing human subjective evaluation processes. Several different strategies have been proposed for distance of fuzzy numbers. The distances introduced for fuzzy numbers can be categorized in two groups:\1. The crisp distances which explain crisp values for the distance between two fuzzy numbers.\2. The fuzzy distance which introduce a fuzzy distance for normal f...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2013